Synopsis: Pseudo Key Neat-Freak

Let’s get introduced to the issues that arise because of having gaps in the data.

Imagine the following scenario. Your manager approaches you, holding two report printouts. “The bean counters are saying we have discrepancies between this quarter’s report and the last quarter’s report. I’m looking at them, and they’re absolutely right. Most of the later assets have disappeared. What happened?”

You look at the reports, and the pattern of discrepancies rings a bell. “No, everything is still there. You asked me to clean up the rows in the database, so there are no missing rows. You said the accountants kept asking you questions about missing assets because of gaps in the numbering. So, I renumbered some of the rows to make them all fit into the places where there were missing rows before. There aren’t any missing rows now — every number between 1 and about 12,340 is used. They’re all still there, but some have just changed the number and moved up. You told me to do this.”

Your manager shakes their head. “But that’s not what I want. The accountants have to track depreciation by the asset numbers. The number for each piece of equipment has to stay the same in each quarterly report. Besides, all the asset ID numbers are printed on labels on each piece. It’d take weeks to relabel everything in the company. Can you please change all the ID numbers back to their original values?”

You want to be cooperative, so you turn back to your keyboard to start working, but suddenly you think of a new problem. “What about new assets we bought this month after I consolidated the asset IDs? The new assets have been assigned ID values that were in use before I did the renumbering. If I change the asset IDs back to their old values, what should I do about the duplicates?”

Objective: Tidy up the data#

There’s a certain type of person who is unnerved by a gap in a series.

bug_id status product_name
1 OPEN Open RoundFile
2 FIXED ReConsider
4 OPEN ReConsider

On the one hand, it’s understandable to be concerned because it’s unclear what happened to the row with bug_id 3. Why didn’t the query return that bug? Did the database lose it? What was in that bug? Was the bug reported by one of our important customers? Will we be held responsible for the lost data?

The objective of one who practices the Pseudokey Neat-Freak antipattern is to resolve these troubling questions. This person is accountable for data integrity issues, but often they don’t have enough understanding of or confidence in the database technology to feel confident of the report results.

Legitimate uses of the antipattern#

The pseudo key is a unique identifier in the database. When a pseudo key generates a new value, the value is greater than the last value it generated, not the highest value currently in the table, as some database programmers assume. Assume that we need to delete a member because they have had to leave the company for breaking some rules, and we decide to reallocate their primary key to a new person. But now, the new member who gets assigned that primary key is liable to catch the blame of their predecessor, which is, of course, unthinkable.

There’s no reason to change the value of a pseudo key since the value should have no significance anyway. If the values in the primary key column carry some meaning, this column is a natural key, not a pseudo key. It’s not unusual to change values in a natural key.

Solution: Trust No One
Antipattern: Filling in the Corners
Mark as Completed
Report an Issue